[SUPPORT] Failed to upsert for commit time #2970

@KarthickAN

Hi,
I keep getting the following error intermittently and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Would that be a problem? Please also guide me in resolving the following error.

py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
... 38 more

Below is my Hudi config:

SmallFileSize = 104857600
MaxFileSize = 125829120
RecordSize = 35
CompressionRatio = 5
InsertSplitSize = 3500000
IndexBloomNumEntries = 1500000
KeyGenClass = org.apache.hudi.keygen.ComplexKeyGenerator
RecordKeyFields = sourceid,sourceassetid,sourceeventid,value,timestamp
TableType = COPY_ON_WRITE
PartitionPathFields = date,sourceid
HiveStylePartitioning = True
WriteOperation = upsert
CompressionCodec = snappy
CommitsRetained = 1
CombineBeforeInsert = True
PrecombineField = timestamp
InsertDropDuplicates = False
InsertShuffleParallelism = 100
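
For anyone trying to reproduce this, the settings above would typically be passed as Hudi datasource options from PySpark, roughly as sketched below. The mapping from the short names to `hoodie.*` keys is an assumption based on the standard Hudi config names, not taken from the actual job code, and `df` and `base_path` in the trailing comment are hypothetical placeholders.

```python
# Assumed mapping of the short setting names above to standard Hudi option keys.
# The values are copied from the list above; the key choice is a best guess.
hudi_options = {
    "hoodie.parquet.small.file.limit": "104857600",          # SmallFileSize
    "hoodie.parquet.max.file.size": "125829120",             # MaxFileSize
    "hoodie.copyonwrite.record.size.estimate": "35",         # RecordSize
    "hoodie.parquet.compression.ratio": "5",                 # CompressionRatio
    "hoodie.copyonwrite.insert.split.size": "3500000",       # InsertSplitSize
    "hoodie.index.bloom.num_entries": "1500000",             # IndexBloomNumEntries
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.ComplexKeyGenerator",        # KeyGenClass
    "hoodie.datasource.write.recordkey.field":
        "sourceid,sourceassetid,sourceeventid,value,timestamp",  # RecordKeyFields
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",   # TableType
    "hoodie.datasource.write.partitionpath.field": "date,sourceid",  # PartitionPathFields
    "hoodie.datasource.write.hive_style_partitioning": "true",  # HiveStylePartitioning
    "hoodie.datasource.write.operation": "upsert",           # WriteOperation
    "hoodie.parquet.compression.codec": "snappy",            # CompressionCodec
    "hoodie.cleaner.commits.retained": "1",                  # CommitsRetained
    "hoodie.combine.before.insert": "true",                  # CombineBeforeInsert
    "hoodie.datasource.write.precombine.field": "timestamp", # PrecombineField
    "hoodie.datasource.write.insert.drop.duplicates": "false",  # InsertDropDuplicates
    "hoodie.insert.shuffle.parallelism": "100",              # InsertShuffleParallelism
}

# Hypothetical usage (df and base_path are placeholders, not from the job):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```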

Environment Description

Hudi version : 0.6.0

Spark version : 2.4.3

Hadoop version : 2.8.5-amzn-1

Storage (HDFS/S3/GCS..) : S3

Running on Docker? (yes/no) : No. Running on AWS Glue
