(Py)Spark 3.0 / Java 11 fails with java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available #200

Closed
jonesberg opened this issue Jul 1, 2020 · 5 comments

Comments

@jonesberg

jonesberg commented Jul 1, 2020

When using the connector with Spark 3.0.0 (on Java 11), trying to read a dataset from BigQuery fails with the error at the bottom.

This is a known problem with Java 9+ and Spark, as noted in the Spark documentation (https://spark.apache.org/docs/3.0.0/). The relevant pull request is apache/spark#26552, and the documentation says:

For Java 11, -Dio.netty.tryReflectionSetAccessible=true is required additionally for Apache Arrow library. This prevents java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available when Apache Arrow uses Netty internally.

My Spark instance is launched with the -Dio.netty.tryReflectionSetAccessible=true flag enabled, and Pandas UDFs and Arrow conversion are working. I also downloaded a sample data set from BigQuery to test my code, and it works without any issues.

Steps to reproduce

With Spark 3.0.0 on Java 11, launch PySpark with the --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" options.

Then try to read a BigQuery table:

df = (spark.read.format("bigquery")
      .option("table", "bigquery-public-data.noaa_gsod.gsod1929")
      .option("credentialsFile", "bq-key.json")
      .load())

df.show()
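
The launch options from the steps above are easy to mistype, so here is a minimal sketch that assembles them programmatically. It only builds and prints the command line; the helper name `netty_conf_args` is hypothetical and not part of Spark or the connector, and the flag value is the one quoted from the Spark documentation.

```python
import shlex

# Flag quoted from the Spark 3.0.0 documentation for Java 11 with Arrow.
NETTY_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"


def netty_conf_args(flag=NETTY_FLAG):
    """Return --conf arguments that pass `flag` to the driver and executor JVMs.

    Hypothetical helper for illustration only.
    """
    args = []
    for key in ("spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"):
        args += ["--conf", f"{key}={flag}"]
    return args


# Assemble the launch command and print it instead of running it.
command = ["pyspark", *netty_conf_args()]
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to a process launcher if PySpark is on the PATH.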

Stacktrace

20/07/01 11:53:14 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
	at com.google.cloud.spark.bigquery.repackaged.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
	at com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:257)
	at com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:247)
	at com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:248)
	at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:88)
	at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:692)
	at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:68)
	at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:106)
	at com.google.cloud.spark.bigquery.ArrowReaderIterator.hasNext(ArrowBinaryIterator.java:111)
	at com.google.cloud.spark.bigquery.ArrowBinaryIterator.hasNext(ArrowBinaryIterator.java:63)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:488)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.agg_doAggregateWithoutKey_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
@medb
Contributor

medb commented Jul 1, 2020

According to Netty, compatibility with Java 11 was fixed in version 4.1.30.Final: netty/netty#8708

But the connector uses version 4.1.27.Final:

"io.netty" % "netty-all" % "4.1.27.Final" % "provided",

We will update the Netty version in the next connector release. In the meantime, you can update the Netty version locally and build the connector jar yourself.
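
To make the version gap concrete, here is a small sketch that compares a Netty version string against the release named in the comment above. The function names are hypothetical, and the `fixed_in` default simply reflects the claim in this thread that 4.1.30.Final contains the Java 11 fix; it is not checked against Netty itself.

```python
def netty_version_tuple(version):
    """Parse a Netty version such as '4.1.27.Final' into (4, 1, 27)."""
    numeric = [part for part in version.split(".") if part.isdigit()]
    return tuple(int(part) for part in numeric)


def has_java11_fix(version, fixed_in="4.1.30.Final"):
    """Return True if `version` is at least `fixed_in` (an assumption from this thread)."""
    return netty_version_tuple(version) >= netty_version_tuple(fixed_in)


print(has_java11_fix("4.1.27.Final"))  # False: the version the connector pins
print(has_java11_fix("4.1.30.Final"))  # True
```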

@jonesberg
Author

jonesberg commented Jul 2, 2020

Thanks for the quick update. I am unable to build the jar locally (and my knowledge of sbt-assembly is really not solid enough to dig this up). I'll test the next release!

@davidrabinowitz
Member

Fixed by PR #204

@lucasdavid

lucasdavid commented Dec 19, 2020

Should I still be experiencing this error, even though I downloaded the release spark-bigquery-with-dependencies_2.12-0.18.0?
@davidrabinowitz, did you manage to get it to work? Did you do anything else besides updating the version and adding -Dio.netty.tryReflectionSetAccessible=true?

@bzora-star

export SPARK_SUBMIT_OPTS="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true"
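
For anyone launching Spark from a Python script rather than a shell, the export above could be mirrored roughly as follows. This is a sketch under assumptions: `with_submit_opts` is a hypothetical helper, and it only prepares an environment mapping; whether the options resolve the error in a given setup is not verified here.

```python
import os


def with_submit_opts(env, extra_opts):
    """Return a copy of `env` with `extra_opts` appended to SPARK_SUBMIT_OPTS.

    Hypothetical helper; existing options are kept and the input is not mutated.
    """
    updated = dict(env)
    existing = updated.get("SPARK_SUBMIT_OPTS", "").strip()
    updated["SPARK_SUBMIT_OPTS"] = " ".join(filter(None, [existing, extra_opts]))
    return updated


# Options copied from the comment above.
OPTS = "--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true"
env = with_submit_opts(os.environ, OPTS)
print(env["SPARK_SUBMIT_OPTS"])
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when starting `spark-submit`, which leaves the current process environment untouched.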
