Description
Running inline compaction and cleaning for MoR tables on EMR 6.12 (i.e. Hudi 0.13).
We are facing a NoSuchElementException on a fileID. The specific file is present in S3, but it does not appear in any commit timeline.
This is a low-frequency table that receives an insert record roughly once a month, and our cleaner and compactor are configured to run inline. However, the cleaner does not clean up the file, which causes the writer to fail.
Another file with a different fileID exists on the same path and contains the updated data.
To resolve the issue we deleted the orphaned stale file, but it is really not clear why this is occurring.
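To cross-check stray objects in S3 against the timeline before deleting them, one option is to compare each object's fileID prefix with the fileIDs referenced by the commit metadata. A minimal sketch of that check, assuming the standard Hudi base-file naming convention `<fileId>_<writeToken>_<instantTime>.<ext>` (the write token and instant time in the example name below are made up; only the fileID comes from the stacktrace):

```python
def base_file_id(object_name: str) -> str:
    """Extract the Hudi fileID prefix from a base (parquet) file name.

    Hudi base files are named <fileId>_<writeToken>_<instantTime>.<ext>,
    and the fileId itself contains hyphens but no underscores, so the
    fileID is simply everything before the first underscore.
    """
    return object_name.split("_")[0]

# Illustrative name built from the fileID in the stacktrace; the write
# token ("0-30-123") and instant time are placeholders, not real values.
stale = "62b95ad5-44df-496c-8309-4d5d270f2ef6-0_0-30-123_20231009172739000.parquet"
print(base_file_id(stale))  # -> 62b95ad5-44df-496c-8309-4d5d270f2ef6-0
```

Any object whose fileID does not show up in the active timeline's file groups would be a candidate orphan, like the one we ended up deleting by hand.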
Here is the hoodie.properties file
hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.table.type=MERGE_ON_READ
hoodie.table.metadata.partitions=
hoodie.table.precombine.field=source_ts_ms
hoodie.table.partition.fields=
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=4106800621
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.timeline.timezone=LOCAL
hoodie.table.name=performer_ride_join_table
hoodie.table.recordkey.fields=performer_id,ride_id
hoodie.compaction.record.merger.strategy=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
hoodie.datasource.write.hive_style_partitioning=false
hoodie.partition.metafile.use.base.format=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.populate.meta.fields=true
hoodie.table.base.file.format=PARQUET
hoodie.database.name=
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.version=5
To Reproduce
Steps to reproduce the behavior:
This does not happen regularly: in the past 3 weeks it happened twice, roughly once per week.
- Create a Hudi writer with the above config
- Set `HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key() -> "1"`
- Insert a record
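The steps above can be sketched as a hypothetical PySpark writer. This is a config sketch, not our exact job: the path is a placeholder, `df` stands for the incoming DataFrame, and the option keys follow Hudi 0.13's documented string names (`hoodie.compact.inline.max.delta.commits` is the string key behind `INLINE_COMPACT_NUM_DELTA_COMMITS`):

```python
# Hypothetical sketch of the writer setup described in this report.
hudi_options = {
    "hoodie.table.name": "performer_ride_join_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.recordkey.field": "performer_id,ride_id",
    "hoodie.datasource.write.precombine.field": "source_ts_ms",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    # Inline table services, as configured on our writer:
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "1",  # INLINE_COMPACT_NUM_DELTA_COMMITS
    "hoodie.clean.automatic": "true",
}

(df.write.format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("s3://<bucket>/<prefix>/performer_ride_join_table"))  # placeholder path
```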
Expected behavior
Cleaner should have cleaned up the orphaned file.
Environment Description
- Hudi version : 0.13
- Spark version : 3.4.0
- Hive version : 3.1.3
- Hadoop version : 3.3.3
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : No
Stacktrace
Error:
23/10/09 17:27:39 ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 174.0 failed 4 times, most recent failure: Lost task 0.3 in stage 174.0 (TID 30924) (ip-10-11-117-156.ec2.internal executor 28): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:336)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:251)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.NoSuchElementException: FileID 62b95ad5-44df-496c-8309-4d5d270f2ef6-0 of partition path does not exist.
at org.apache.hudi.io.HoodieMergeHandle.getLatestBaseFile(HoodieMergeHandle.java:156)