Spark 3.4 Support #343

Closed
rbtrtr opened this issue Sep 18, 2023 · 13 comments · Fixed by #358

Comments

@rbtrtr

rbtrtr commented Sep 18, 2023

Is support for Spark 3.4.x planned?

@luisvicenteatprima

I have tested it on Databricks with Spark 3.4.1 and it works.

@hafizmujadidKhalid

I tested it with Spark 3.5 as well and it works fine.

@cerveada
Collaborator

Unless Spark changed some of the APIs we use between versions, Abris will work fine. Thanks for reporting your tests, it's valuable information for us and for other Abris users.

@jelmew

jelmew commented Jan 19, 2024

There is an incompatibility with Spark 3.5. It seems some constructors have changed, which causes problems for Abris.


Caused by: java.lang.NoSuchMethodException: org.apache.spark.sql.avro.AvroDeserializer.<init>(org.apache.avro.Schema, org.apache.spark.sql.types.DataType, java.lang.String)
	at java.lang.Class.getConstructor0(Class.java:3082)
	at java.lang.Class.getConstructor(Class.java:1825)
	at org.apache.spark.sql.avro.AbrisAvroDeserializer$$anonfun$1.applyOrElse(AbrisAvroDeserializer.scala:38)
	at org.apache.spark.sql.avro.AbrisAvroDeserializer$$anonfun$1.applyOrElse(AbrisAvroDeserializer.scala:37)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at scala.util.Failure.recover(Try.scala:234)
	at org.apache.spark.sql.avro.AbrisAvroDeserializer.<init>(AbrisAvroDeserializer.scala:37)
	at za.co.absa.abris.avro.sql.AvroDataToCatalyst.deserializer$lzycompute(AvroDataToCatalyst.scala:71)
	at za.co.absa.abris.avro.sql.AvroDataToCatalyst.deserializer(AvroDataToCatalyst.scala:71)
	at za.co.absa.abris.avro.sql.AvroDataToCatalyst.nullSafeEval(AvroDataToCatalyst.scala:87)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
	at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1139)
	at com.databricks.photon.NativeRowBatchIterator.hasNext(NativeRowBatchIterator.java:44)
	at 0xa8e0d62 <photon>.HasNext(external/workspace_spark_3_5/photon/jni-wrappers/jni-row-batch-iterator.cc:50)
	at 0x5e136fb <photon>.OpenImpl(external/workspace_spark_3_5/photon/exec-nodes/file-writer-node.cc:166)
	at com.databricks.photon.JniApiImpl.open(Native Method)
	at com.databricks.photon.JniApi.open(JniApi.scala)
	at com.databricks.photon.JniExecNode.open(JniExecNode.java:71)
	at com.databricks.photon.PhotonWriteResultHandler.$anonfun$getResult$3(PhotonWriteStageExec.scala:121)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.photon.PhotonResultHandler.timeit(PhotonResultHandler.scala:30)
	at com.databricks.photon.PhotonResultHandler.timeit$(PhotonResultHandler.scala:28)
	at com.databricks.photon.PhotonWriteResultHandler.timeit(PhotonWriteStageExec.scala:67)
	at com.databricks.photon.PhotonWriteResultHandler.$anonfun$getResult$2(PhotonWriteStageExec.scala:121)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1542)
	at com.databricks.photon.PhotonWriteResultHandler.getResult(PhotonWriteStageExec.scala:118)
	... 38 more

@kevinwallimann
Collaborator

Hi @jelmew
Thanks for reporting the issue. Could you please double-check if you are really using v3.5.0, or maybe the latest master version? I'm looking at https://github.com/apache/spark/blob/master/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L59 and indeed I can see that a new constructor argument useStableIdForUnionType: Boolean was added. However, this is on the master branch of Spark. I don't see that argument in v3.5.0 https://github.com/apache/spark/blob/v3.5.0/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L53-L56, and @hafizmujadidKhalid reported no issues with Spark 3.5.0.

@cerveada See https://issues.apache.org/jira/browse/SPARK-43380. We need to take care of this for the next release of Spark.
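
Editor's note: the stack trace above shows `Class.getConstructor` failing for the exact signature `(Schema, DataType, String)`, which is what happens when a reflective lookup targets a constructor that a newer build has replaced with one taking more arguments. The following is a minimal sketch of that failure mode and of one possible fallback, not the code Abris actually uses. `OldDeserializer`, `NewDeserializer`, and `build` are hypothetical stand-ins, and plain `String` parameters replace the real Avro and Spark types so that the example runs without Spark on the classpath.

```java
final class ConstructorFallback {
    // Hypothetical stand-in for a library class with the older, shorter constructor.
    static final class OldDeserializer {
        private final String description;
        public OldDeserializer(String schema, String mode) {
            this.description = schema + "/" + mode;
        }
        @Override public String toString() { return description; }
    }

    // Hypothetical stand-in for the same class after a boolean argument was added.
    static final class NewDeserializer {
        private final String description;
        public NewDeserializer(String schema, String mode, boolean useStableId) {
            this.description = schema + "/" + mode + "/" + useStableId;
        }
        @Override public String toString() { return description; }
    }

    // Looks up the older signature first; if it is missing, falls back to the
    // newer signature and supplies a default (false) for the added argument.
    static Object build(Class<?> cls, String schema, String mode) {
        try {
            try {
                return cls.getConstructor(String.class, String.class)
                          .newInstance(schema, mode);
            } catch (NoSuchMethodException olderSignatureMissing) {
                return cls.getConstructor(String.class, String.class, boolean.class)
                          .newInstance(schema, mode, false);
            }
        } catch (ReflectiveOperationException e) {
            // Neither signature exists (or construction failed): report it clearly.
            throw new IllegalStateException(
                "No supported constructor on " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build(OldDeserializer.class, "schema", "PERMISSIVE"));
        System.out.println(build(NewDeserializer.class, "schema", "PERMISSIVE"));
    }
}
```

A fallback like this only covers signatures that are known in advance, so a runtime that ships yet another constructor variant would still fail with an error.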

@jelmew

jelmew commented Jan 22, 2024

Hi @kevinwallimann

This is on Databricks Runtime 14.2: https://docs.databricks.com/en/release-notes/runtime/14.2.html. They might have backported something from master?

Kind regards,
Jelmer

Edit:

Okay, yup. They included it.

[Screenshot attached in the original comment]

@sauletawil

+1

@kevinwallimann
Collaborator

Hi @jelmew
Unfortunately, I don't have access to Spark on Databricks Runtime, but I will reproduce and fix the error by building Spark locally using the latest commit on branch-3.5. Since I've identified a few issues when running the tests for Abris with Spark 3.5.0, I need to fix those first. See #350.

@lucafurrer

+1

@kevinwallimann
Collaborator

Hi @jelmew
I've released v6.4.0 with the bugfix. https://github.com/AbsaOSS/ABRiS/releases/tag/v6.4.0. Please let me know if it fixes your issue.

@jelmew

jelmew commented Feb 13, 2024

We are testing.

@roicostas

Same problem here. The AvroDeserializer constructor seems to have changed a lot recently, gaining more parameters:
3.5.0 has 3 parameters
3.5.1 has 4 parameters
current master version has 5 parameters

We are also using the latest Databricks LTS version, 14.3, which in theory uses Spark 3.5.0 but apparently includes some customizations: AvroDeserializer seems to be provided by one of their jars, and it is the new version with 5 parameters.
The jar which provides AvroDeserializer is: file:/databricks/jars/----ws_3_5--connector--avro--avro-hive-2.3__hadoop-3.2_2.12_shaded---606136534--avro-unshaded-hive-2.3__hadoop-3.2_2.12_deploy.jar!/org/apache/spark/sql/avro/AvroDeserializer.class

We printed out the available constructors and confirmed that the only one available is the one with 5 parameters.
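
Editor's note: a check like the one described here can be done with plain JVM reflection. The sketch below is one way to list a class's public constructors and the location it was loaded from; it is not the code used in this comment. The `ConstructorLister` class and its method names are hypothetical. It defaults to `java.lang.StringBuilder` so that it runs anywhere; on a cluster you would pass `org.apache.spark.sql.avro.AvroDeserializer` as the argument instead.

```java
import java.lang.reflect.Constructor;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class ConstructorLister {
    // Loads the named class, turning the checked exception into an unchecked one.
    private static Class<?> load(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on classpath: " + className, e);
        }
    }

    // Returns one line per public constructor, e.g. "1 parameter(s): (int)".
    static List<String> describe(String className) {
        List<String> lines = new ArrayList<>();
        for (Constructor<?> ctor : load(className).getConstructors()) {
            Class<?>[] params = ctor.getParameterTypes();
            StringBuilder types = new StringBuilder();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    types.append(", ");
                }
                types.append(params[i].getTypeName());
            }
            lines.add(params.length + " parameter(s): (" + types + ")");
        }
        return lines;
    }

    // Returns the URL of the .class resource (for example a jar path),
    // or the string "null" if it cannot be resolved.
    static String location(String className) {
        Class<?> cls = load(className);
        URL url = cls.getResource(cls.getSimpleName() + ".class");
        return String.valueOf(url);
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "java.lang.StringBuilder";
        System.out.println("Loaded from: " + location(className));
        for (String line : describe(className)) {
            System.out.println(line);
        }
    }
}
```

Comparing this output across runtimes (for example Databricks 13.3 and 14.3) shows which constructor signatures each one actually provides, regardless of the Spark version it reports.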

With Databricks 13.3 it works fine.

@jelmew

jelmew commented Mar 26, 2024

I agree. Databricks 13.3 works fine, but 14.3 LTS has the above problem again.
