
[SUPPORT] Spark3.2 encountered duplicate data while reading the hudi bucket MOR table #9244

@fujianhua168

Description


Describe the problem you faced
A few days ago in our production environment, a datanode in the Hadoop cluster went down, causing the Flink streaming write task (for a Hudi bucket MOR table) to fail. After restarting the Flink task, when we read the table with Spark 3.2 or Presto 333, we found duplicate records under the same primary key, and the duplicates have identical Hudi system field values (_hoodie_commit_time, _hoodie_commit_seqno, _hoodie_file_name).
Note: this Flink write task had been running normally for several days; there were no duplicate records before the datanode went down.
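To confirm that these are exact file-level duplicate copies (same _hoodie_commit_seqno for the same key) rather than multiple commit versions of a record, a check along the following lines can be run over the rows read back. The row layout below is a hypothetical illustration, not data from the actual table:

```python
from collections import Counter

# Hypothetical sample of rows read back from the MOR table.
# In the reported case the duplicates share identical Hudi system fields,
# which suggests the same record was emitted twice, not two versions of it.
rows = [
    {"_hoodie_record_key": "id:1", "_hoodie_commit_seqno": "20230715_0_1", "val": "a"},
    {"_hoodie_record_key": "id:1", "_hoodie_commit_seqno": "20230715_0_1", "val": "a"},  # duplicate copy
    {"_hoodie_record_key": "id:2", "_hoodie_commit_seqno": "20230715_0_2", "val": "b"},
]

# Count (record key, commit seqno) pairs: a count > 1 means an exact duplicate,
# since a legitimate update would carry a different commit seqno.
counts = Counter((r["_hoodie_record_key"], r["_hoodie_commit_seqno"]) for r in rows)
dupes = {k: n for k, n in counts.items() if n > 1}
print(dupes)  # → {('id:1', '20230715_0_1'): 2}
```

The same grouping can be expressed in Spark SQL by grouping on the primary key plus the Hudi system columns and filtering on count(*) > 1.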

(two screenshots attached showing the duplicate rows returned by the query)

Environment Description

  • Hudi version : 0.13.0
  • Spark version : 3.2
  • Hive version : 3.1
  • Hadoop version : 3.0
  • Storage (HDFS/S3/GCS..) : HDFS
  • Running on Docker? (yes/no) : no


Metadata

Labels: issue:data-consistency (Data consistency issues: duplicates/phantoms), priority:critical (Production degraded; pipelines stalled)
Status: 👤 User Action
Assignees: none
Milestone: none