
[SUPPORT] Do the latest versions of Hudi (0.7.0, 0.6.0) work with Spark 2.3.0 when reading ORC files? #2592

@codejoyan

Description

The documentation says: Hudi works with Spark-2.x & Spark 3.x versions. (https://hudi.apache.org/docs/quick-start-guide.html) But I have not been able to use hudi-spark-bundle_2.11 version 0.7.0 with Spark 2.3.0 and Scala 2.11.12. Is there any specific spark_avro package one has to use?

I am trying to write a DataFrame read from an ORC file.
The job fails with the error below: java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I. Any inputs will be very helpful.

The cluster I am working with runs Spark 2.3.0, and no immediate upgrade is planned. Is there any way to make Hudi 0.7.0 work with Spark 2.3.0?

Note: I am able to use Spark 2.3.0 with hudi-spark-bundle-0.5.0-incubating.jar
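For context on the error: Hudi 0.6.0/0.7.0 bundle spark-avro schema converters built against Spark 2.4, whose `Decimal` companion object exposes `minBytesForPrecision`; Spark 2.3's apparently does not, which would explain the `NoSuchMethodError`. A small reflection probe (a diagnostic sketch, not part of the original report) can check whether the method exists on the current classpath without triggering a Hudi write:

```scala
// Diagnostic sketch (assumption: run on the cluster's Spark classpath).
// Checks via reflection whether a class exposes minBytesForPrecision,
// the method the Hudi-bundled SchemaConverters call at write time.
object DecimalApiProbe {
  def hasMinBytesForPrecision(className: String): Boolean =
    try {
      Class.forName(className).getMethods.exists(_.getName == "minBytesForPrecision")
    } catch {
      case _: ClassNotFoundException => false // class absent => method absent
    }

  def main(args: Array[String]): Unit =
    // Expected false on a Spark 2.3 classpath, true on 2.4+ (assumption).
    println(hasMinBytesForPrecision("org.apache.spark.sql.types.Decimal$"))
}
```

Running this in a `spark-shell` on the 2.3.0 cluster before attempting a write would confirm whether the missing method is the culprit.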

spark-shell Command

spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.7.0,org.apache.spark:spark-avro_2.11:2.4.4,org.apache.avro:avro:1.8.2 \
  --conf spark.driver.extraClassPath=/u/users/j0s0j7j/.ivy2/jars/org.apache.avro_avro-1.8.2.jar \
  --conf spark.executor.extraClassPath=/u/users/j0s0j7j/.ivy2/jars/org.apache.avro_avro-1.8.2.jar \
  --conf "spark.sql.hive.convertMetastoreParquet=false" \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"

Stack Trace
scala> val BaseTableIncrLoad = spark.sql(
     |   s"""select * from $srcSchema.$srcTable
     |      |where load_time >= ('2021-01-20') and load_time < ('2021-01-21')""".stripMargin)

scala> val transformedDF = BaseTableIncrLoad.withColumn("partitionpath", concat(lit("part_str_col="), col("part_str_col"), lit("/part_date_col="), col("part_date_col")))

scala> transformedDF.write.format("org.apache.hudi").
| options(getQuickstartWriteConfigs).
| option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "col1").
| //option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col2").
| option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "col3,col4,col5").
| option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
| option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.ComplexKeyGenerator").
| option("hoodie.upsert.shuffle.parallelism","20").
| option("hoodie.insert.shuffle.parallelism","20").
| option(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES, 128 * 1024 * 1024).
| option(HoodieStorageConfig.PARQUET_FILE_MAX_BYTES, 128 * 1024 * 1024).
| option(HoodieWriteConfig.TABLE_NAME, "targetTableHudi").
| mode(SaveMode.Append).
| save(targetPath)
21/02/22 07:14:03 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
java.lang.NoSuchMethodError: org.apache.spark.sql.types.Decimal$.minBytesForPrecision()[I
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:156)
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:52)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:139)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
... 62 elided
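Since the failure only surfaces deep inside SchemaConverters, a fail-fast guard run before the write gives a clearer message. The sketch below is hypothetical (not a Hudi API) and assumes the 0.6.0+ bundles target Spark 2.4.x, so a 2.3.0 cluster should stay on the 0.5.x bundle:

```scala
// Hypothetical pre-flight version guard (assumption: Hudi 0.6.0+ bundles
// require Spark 2.4 or later for their avro schema conversion).
object SparkVersionGuard {
  def requireSparkAtLeast(sparkVersion: String, minMajor: Int, minMinor: Int): Unit = {
    // Parse "major.minor" from a version string like "2.3.0".
    val Array(major, minor) = sparkVersion.split("\\.").take(2).map(_.toInt)
    require(
      major > minMajor || (major == minMajor && minor >= minMinor),
      s"Hudi 0.7.0 bundle expects Spark >= $minMajor.$minMinor, found Spark $sparkVersion")
  }
}

// Usage in the shell, before calling transformedDF.write:
//   SparkVersionGuard.requireSparkAtLeast(spark.version, 2, 4)
// On a 2.3.0 cluster this throws IllegalArgumentException with an
// explicit message instead of the opaque NoSuchMethodError above.
```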
