Description
Hi,
I keep getting the following error intermittently and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Could that be the issue? Please also guide me in resolving the error below.
```
py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
	at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
	at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
	at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
	at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
	... 38 more
```
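For context, the failing check sits in `HoodieActiveTimeline.transitionState`, which moves a commit from `.requested` to `.inflight` on the timeline. A toy model (not Hudi's actual code) of how a concurrent job can trip it: one writer registers an instant, a second job's rollback or cleanup removes it, and the first writer's transition then fails the argument check, just like the stack trace above:

```python
class FakeTimeline:
    """Toy model of a Hudi active timeline kept as instant-state markers."""

    def __init__(self):
        self.instants = {}  # commit time -> state ("requested" / "inflight")

    def create_requested(self, commit_time):
        self.instants[commit_time] = "requested"

    def rollback_pending(self):
        # A second job cleaning up "failed" writes removes pending instants
        # that actually belong to a still-running writer.
        self.instants = {t: s for t, s in self.instants.items()
                         if s not in ("requested", "inflight")}

    def transition_requested_to_inflight(self, commit_time):
        # Mirrors the spirit of ValidationUtils.checkArgument: the instant
        # must still be in REQUESTED state when we try to move it forward.
        if self.instants.get(commit_time) != "requested":
            raise ValueError(f"instant {commit_time} is not in REQUESTED state")
        self.instants[commit_time] = "inflight"


timeline = FakeTimeline()
timeline.create_requested("20210520040253")   # writer A starts a commit
timeline.rollback_pending()                   # concurrent job B cleans it up
# writer A's transition_requested_to_inflight("20210520040253") now raises,
# analogous to the IllegalArgumentException in the trace
```

This is only an illustration of the race, but it matches the symptom: the error is intermittent because it depends on the two jobs' timing.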
Below is my Hudi config:
```
SmallFileSize = 104857600
MaxFileSize = 125829120
RecordSize = 35
CompressionRatio = 5
InsertSplitSize = 3500000
IndexBloomNumEntries = 1500000
KeyGenClass = org.apache.hudi.keygen.ComplexKeyGenerator
RecordKeyFields = sourceid,sourceassetid,sourceeventid,value,timestamp
TableType = COPY_ON_WRITE
PartitionPathFields = date,sourceid
HiveStylePartitioning = True
WriteOperation = upsert
CompressionCodec = snappy
CommitsRetained = 1
CombineBeforeInsert = True
PrecombineField = timestamp
InsertDropDuplicates = False
InsertShuffleParallelism = 100
```
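The names above look like a wrapper's friendly aliases rather than Hudi's own option keys. Assuming they map onto the standard `hoodie.*` keys (this mapping is my best guess from the values, so please verify against the wrapper), the equivalent PySpark write options would look roughly like:

```python
# Assumed mapping of the aliases above onto standard Hudi option keys
# (these key names are my guess -- double-check them against your wrapper).
hudi_options = {
    "hoodie.parquet.small.file.limit": "104857600",
    "hoodie.parquet.max.file.size": "125829120",
    "hoodie.copyonwrite.record.size.estimate": "35",
    "hoodie.parquet.compression.ratio": "5",
    "hoodie.copyonwrite.insert.split.size": "3500000",
    "hoodie.index.bloom.num_entries": "1500000",
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.recordkey.field":
        "sourceid,sourceassetid,sourceeventid,value,timestamp",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.partitionpath.field": "date,sourceid",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.parquet.compression.codec": "snappy",
    "hoodie.cleaner.commits.retained": "1",
    "hoodie.combine.before.insert": "true",
    "hoodie.datasource.write.precombine.field": "timestamp",
    "hoodie.datasource.write.insert.drop.duplicates": "false",
    "hoodie.insert.shuffle.parallelism": "100",
}

# The write that produces the o318.save call in the trace would then be:
# df.write.format("hudi").options(**hudi_options).mode("append").save(s3_path)
```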
Environment Description
Hudi version : 0.6.0
Spark version : 2.4.3
Hadoop version : 2.8.5-amzn-1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No. Running on AWS Glue
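On the concurrency question: to my knowledge Hudi 0.6.0 has no multi-writer support, so two jobs upserting the same table can race on the timeline exactly as above (optimistic concurrency control with lock providers only arrived in later releases). Until the jobs can be serialized at the orchestration level (e.g. in Glue workflow triggers), one crude stopgap is an external mutex around the write. A hypothetical sketch using an atomically created lock file (shown with a local path for illustration; note a plain S3 PUT is not atomic in this sense, so a real S3 deployment would need DynamoDB or similar):

```python
import contextlib
import os


@contextlib.contextmanager
def table_write_lock(lock_path):
    """Crude external mutex: O_CREAT | O_EXCL makes file creation atomic,
    so only one writer can hold the lock at a time."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another job holds {lock_path}; retry later")
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


# Hypothetical usage around the Hudi write:
# with table_write_lock("/tmp/my_hudi_table.lock"):
#     df.write.format("hudi").options(**hudi_options).mode("append").save(path)
```

A stale lock left behind by a crashed job would need manual cleanup or a TTL, which is part of why a proper lock service is preferable to a bare file.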