Fix tests for Spark 3.5.0 #350

Closed
kevinwallimann opened this issue Jan 24, 2024 · 0 comments · Fixed by #351

Describe the bug

Executing the tests fails with Spark 3.5.0.

To Reproduce

Steps to reproduce the behavior:

  1. Check out the latest master
  2. Change the Spark version in pom.xml to 3.5.0
  3. Run mvn clean test (using Java 8)
  4. See the errors below
[ERROR] /ABRiS/src/main/scala/za/co/absa/abris/examples/ConfluentKafkaAvroWriter.scala:88: error: value apply is not a member of object org.apache.spark.sql.catalyst.encoders.RowEncoder
[ERROR]     RowEncoder.apply(sparkSchema)
[ERROR]                ^
[ERROR] one error found

This error can be fixed by replacing RowEncoder.apply with org.apache.spark.sql.Encoders.row.
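
A minimal before/after sketch of that change (the object name and schema are hypothetical stand-ins for the real code in ConfluentKafkaAvroWriter.scala):

    import org.apache.spark.sql.{Encoder, Encoders, Row}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    object EncoderMigrationSketch {
      // hypothetical stand-in schema
      val sparkSchema: StructType = StructType(Seq(StructField("value", StringType)))

      // Before (no longer compiles against Spark 3.5.0):
      // val encoder: Encoder[Row] = RowEncoder.apply(sparkSchema)

      // After (Spark 3.5.0):
      val encoder: Encoder[Row] = Encoders.row(sparkSchema)
    }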

The next error is:

SchemaLoaderSpec:
SchemaLoader
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/exc/StreamConstraintsException
  at com.fasterxml.jackson.databind.node.JsonNodeFactory.objectNode(JsonNodeFactory.java:353)
  at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:100)
  at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:25)
  at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
  at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4801)
  at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3084)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
  at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
  at all_types.test.NativeSimpleOuter.<clinit>(NativeSimpleOuter.java:18)
  at za.co.absa.abris.examples.data.generation.TestSchemas$.<init>(TestSchemas.scala:35)

This can be fixed, e.g., by explicitly pinning jackson-core to version 2.15.2, overriding the v2.12.2 that avro 1.10.2 pulls in transitively. Spark 3.5.0 depends on jackson-databind v2.15.2, which references the StreamConstraintsException class introduced in jackson-core 2.15, hence the NoClassDefFoundError when the older jackson-core is on the classpath:

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.15.2</version>
        </dependency>
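
One way to confirm that the override took effect is to inspect the resolved version with the Maven dependency plugin:

        mvn dependency:tree -Dincludes=com.fasterxml.jackson.core:jackson-core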

Expected behavior

The tests should pass with the current Spark versions 3.2.4, 3.3.3, 3.4.2, and 3.5.0. These versions should be added to the test-and-verify GitHub action.

Additional context

If you replace RowEncoder.apply with RowEncoder.encoderFor instead, the following exception is thrown in 18 tests:

- should convert all types of data to confluent avro an back using schema registry for key *** FAILED ***
  org.apache.spark.SparkRuntimeException: Only expression encoders are supported for now.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedEncoderError(QueryExecutionErrors.scala:477)
  at org.apache.spark.sql.catalyst.encoders.package$.encoderFor(package.scala:34)
  at org.apache.spark.sql.catalyst.plans.logical.CatalystSerde$.generateObjAttr(object.scala:47)
  at org.apache.spark.sql.execution.ExternalRDD$.apply(ExistingRDD.scala:35)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:498)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:367)
  at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:236)
  at za.co.absa.abris.avro.sql.CatalystAvroConversionSpec.getTestingDataFrame(CatalystAvroConversionSpec.scala:55)
  at za.co.absa.abris.avro.sql.CatalystAvroConversionSpec.$anonfun$new$23(CatalystAvroConversionSpec.scala:484)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  ...

The fix is to replace RowEncoder.encoderFor with org.apache.spark.sql.Encoders.row (as in the sketch above), as mentioned in https://issues.apache.org/jira/browse/SPARK-45311.

If you get

java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x74ad2091) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x74ad2091

run the tests with Java 8, or add the VM option --add-exports java.base/sun.nio.ch=ALL-UNNAMED.
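
One way to wire that VM option into the build, sketched under the assumption that the tests run through the scalatest-maven-plugin (Surefire's argLine works the same way; adjust to whatever plugin the build actually uses):

        <plugin>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest-maven-plugin</artifactId>
            <configuration>
                <!-- assumption: the forked test JVM picks up this argLine -->
                <argLine>--add-exports java.base/sun.nio.ch=ALL-UNNAMED</argLine>
            </configuration>
        </plugin>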

kevinwallimann self-assigned this Jan 24, 2024