
[SUPPORT] UnresolvedUnionException: "Not in union" exception occurred when writing data through Spark #7902

@nbeeee

Description



Describe the problem you faced

When upserting data to Hudi through Spark, an `UnresolvedUnionException` is thrown. The exception appears only after multiple executions and seems to be triggered by the archive operation.

To Reproduce

Steps to reproduce the behavior:

  1. Upsert data to Hudi through Spark repeatedly.

  2. After multiple runs, the archive operation is triggered and the exception occurs.
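The report does not include the Hudi writer configuration, but the value in the exception message, `KEEP_LATEST_FILE_VERSIONS`, is one of Hudi's cleaner policies, which is recorded in the clean plan that the archiver later serializes. A plausible set of write options for a repeated upsert job of this kind might look like the sketch below; the table name, record key, and precombine field are placeholders, not taken from the report.

```python
# Hypothetical Hudi write options for a repeated upsert job.
# Only "hoodie.cleaner.policy" is grounded in the error message;
# the table/key/precombine names are illustrative placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",                   # placeholder
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",   # placeholder
    "hoodie.datasource.write.precombine.field": "ts",  # placeholder
    # The value that appears in the exception message is the cleaner policy:
    "hoodie.cleaner.policy": "KEEP_LATEST_FILE_VERSIONS",
    # Archiving kicks in once the active timeline exceeds these bounds:
    "hoodie.keep.min.commits": "20",
    "hoodie.keep.max.commits": "30",
}

# Typical use from a Spark DataFrame (not runnable without a cluster):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```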

Expected behavior

Upserts complete and old commits are archived without throwing an exception.

Environment Description

  • Hudi version: 0.12.1

  • Spark version: 3.1.1

  • Hive version: 2.7.3

  • Hadoop version: 3.1.2

  • Storage (HDFS/S3/GCS..): HDFS

  • Running on Docker? (yes/no): no

Additional context

```shell
/data/spark3/bin/spark-submit \
  --master yarn \
  --name xxxxx \
  --deploy-mode cluster \
  --driver-memory 4g \
  --num-executors 5 \
  --total-executor-cores 6 \
  --executor-memory 12g \
  --conf spark.memory.fraction=0.85 \
  --conf spark.memory.storageFraction=0.85 \
  --queue default \
  --class Kudu2Hudi spark_dependence_jar/hudi/hudi.jar
```

Stacktrace

```
23/02/07 02:00:01 ERROR HoodieTimelineArchiver: Failed to archive commits, .commit file: 20230201144304071.clean.requested
org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"map","values":{"type":"array","items":{"type":"string","avro.java.string":"String"}},"avro.java.string":"String"}]: KEEP_LATEST_FILE_VERSIONS
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.serializeRecords(HoodieAvroDataBlock.java:119)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:135)
	at org.apache.hudi.client.HoodieTimelineArchiver.writeToFile(HoodieTimelineArchiver.java:671)
	at org.apache.hudi.client.HoodieTimelineArchiver.archive(HoodieTimelineArchiver.java:643)
	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:171)
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:909)
	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:629)
	at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:534)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:237)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:714)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:340)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:144)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
	at Kudu2Hudi$.main(Kudu2Hudi.scala:120)
	at Kudu2Hudi.main(Kudu2Hudi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:732)
23/02/07 02:00:01 ERROR HoodieTimelineArchiver: Failed to archive commits, .commit file: 20230201144304071.clean.inflight.
```
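For context on the failure mode: the trace shows `GenericData.resolveUnion` rejecting the string value `KEEP_LATEST_FILE_VERSIONS` because none of the union's branches (`null` or a map of string arrays) matches a plain string. The following pure-Python sketch mimics that check; `resolve_union` is a hypothetical stand-in, not Avro's actual API.

```python
# Illustrative sketch of Avro-style union resolution (NOT Avro's real API).
# A union schema lists allowed branches; writing a datum that matches no
# branch raises the equivalent of UnresolvedUnionException.

def resolve_union(branches, datum):
    """Return the first union branch the datum matches, else raise."""
    for branch in branches:
        if branch == "null" and datum is None:
            return branch
        if branch == "map" and isinstance(datum, dict):
            return branch
    raise ValueError("Not in union %r: %r" % (branches, datum))

# A dict (an Avro map) resolves to the "map" branch:
resolve_union(["null", "map"], {"partition": ["file1", "file2"]})

# A bare string, like the clean policy value in the stack trace, matches
# neither branch and fails, just as the archiver's write did:
try:
    resolve_union(["null", "map"], "KEEP_LATEST_FILE_VERSIONS")
except ValueError as exc:
    print(exc)
```

The real fix has to make the serialized clean plan's schema and data agree; the sketch only shows why the writer rejects the value.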

Metadata

Assignees

No one assigned

    Labels

  • area:schema (Schema evolution and data types)

    • engine:spark (Spark integration)

    • priority:high (Significant impact; potential bugs)

    Type

    No type

    Projects

    Status

    ⏳ Awaiting Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
