
Error: Could not initialize class ru.yandex.clickhouse.ClickHouseUtil when using with PySpark #138

Closed
vsmelov opened this issue Sep 20, 2017 · 4 comments



vsmelov commented Sep 20, 2017

When I dump a PySpark DataFrame with Integer, DateTime, Float, or Date columns to ClickHouse, everything works fine.
But when I try to dump a DataFrame with a String-typed column, I get an error.

My PySpark code:

    df = spark.range(0, 10).withColumnRenamed('id', 'field_int')
    from pyspark.sql.functions import lit
    df = df.withColumn('field_str', lit('abcdef'))
    df.write.jdbc(url=config['ch_url'], table="test_with_string", mode="append",
                  properties=config["ch_properties"])

Error:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class ru.yandex.clickhouse.ClickHouseUtil
	at ru.yandex.clickhouse.ClickHousePreparedStatementImpl.setString(ClickHousePreparedStatementImpl.java:214)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeSetter$8.apply(JdbcUtils.scala:525)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeSetter$8.apply(JdbcUtils.scala:524)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:629)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:782)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more

What can I do about it?
I am not a Java developer, so I don't know how to compile and "import" the class ru.yandex.clickhouse.ClickHouseUtil into my project.

There are some matching files in the compiled repository:

vsmelov@vsmelov:~/PycharmProjects/etl/spark_test$ find /var/bigdata/clickhouse-jdbc/ -name '*ClickHouseUtil*'
/var/bigdata/clickhouse-jdbc/src/test/java/ru/yandex/clickhouse/ClickHouseUtilTest.java
/var/bigdata/clickhouse-jdbc/src/main/java/ru/yandex/clickhouse/ClickHouseUtil.java
/var/bigdata/clickhouse-jdbc/target/classes/ru/yandex/clickhouse/ClickHouseUtil.class
/var/bigdata/clickhouse-jdbc/target/test-classes/ru/yandex/clickhouse/ClickHouseUtilTest.class
/var/bigdata/clickhouse-jdbc/target/apidocs/ru/yandex/clickhouse/class-use/ClickHouseUtil.html
/var/bigdata/clickhouse-jdbc/target/apidocs/ru/yandex/clickhouse/ClickHouseUtil.html

But there are no .jar files.
Thanks in advance.

@serebrserg (Contributor)

ClickHouseUtil is just a simple class, like ClickHousePreparedStatementImpl, which is loaded correctly according to the error message. In fact, they are in the same package, so they must be packaged and loaded the same way.

I'm not familiar with PySpark. Could you describe your setup: how do you provide the ClickHouse JDBC driver to your application? How did you get the driver?

I think the default way for you, considering #137, is to download the jar from https://mvnrepository.com/artifact/ru.yandex.clickhouse/clickhouse-jdbc/0.1.28 and add the needed dependency jars. All classes should be loaded correctly this way.
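For reference, downloaded jars are typically handed to PySpark at launch; a minimal sketch, with jar paths and the driver version here being illustrative:

```shell
# Option 1: pass locally downloaded jars (comma-separated, no spaces);
# the dependency jars must be listed too.
pyspark --jars clickhouse-jdbc-0.1.28.jar,guava-19.0.jar

# Option 2: let Spark resolve the driver and its transitive
# dependencies from Maven Central at startup.
pyspark --packages ru.yandex.clickhouse:clickhouse-jdbc:0.1.28
```

The same `--jars` / `--packages` flags work with `spark-submit` as well.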


vsmelov commented Sep 29, 2017

Thanks!
It works when I download these jars

- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.2
- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore/4.4.4
- https://mvnrepository.com/artifact/ru.yandex.clickhouse/clickhouse-jdbc/0.1.28
- https://mvnrepository.com/artifact/com.google.guava/guava/19.0
- https://mvnrepository.com/artifact/net.jpountz.lz4/lz4/1.3.0
- https://mvnrepository.com/artifact/joda-time/joda-time/2.9.3

Table definition:

    CREATE TABLE test_insert (
        field_int Int64,
        field_float Float64,
        field_date Date,
        field_datetime DateTime,
        field_str String
    ) ENGINE = Memory;

PySpark code:
    from pyspark.sql.functions import lit
    from datetime import datetime, date
    from pprint import pprint

    df = spark.range(0, 10).withColumnRenamed('id', 'field_int')
    df = df.withColumn('field_float', df['field_int'].cast('double'))
    df = df.withColumn('field_date', lit(date.today()))
    df = df.withColumn('field_datetime', lit(datetime.now()))
    df = df.withColumn('field_str', lit('abcdef'))
    df.write.jdbc(url=config['ch_url'], table="test_insert", mode="append",
                  properties=config["ch_properties"])

vsmelov closed this as completed Sep 29, 2017

humbledeveloper43 commented Jan 11, 2018

I'm getting the same exception in my Java application. I added the driver to my pom.xml like this:

<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.1.34</version>
</dependency>

I'm packaging my application with its dependencies, but when I start it with java -jar myapp.jar it throws that exception. Any suggestions?
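For what it's worth, "Could not initialize class" usually means the static initializer of ClickHouseUtil failed on an earlier load attempt, typically because a transitive dependency (e.g. Guava, which the working fix above also had to download) is missing from the runtime classpath; plain `java -jar` does not pull in jars beyond the one named. One common remedy is bundling everything into a fat jar; a sketch using the maven-shade-plugin (plugin version illustrative):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <!-- rebuild the jar at package time with all
               declared dependencies unpacked inside it -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

After `mvn package`, `java -jar` on the shaded artifact should find ClickHouseUtil's dependencies on the classpath.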

@gxlzlihao

I had the same error, but my issue was solved after adding the following parameter to the pyspark command: --jars jars/clickhouse-jdbc-0.2.4.jar,jars/guava-19.0.jar


4 participants