[HUDI-6822] Fix deletes handling in hbase index when partition path is updated #9630
List<HoodieRecord> recordList = new LinkedList<>();
for (HoodieRecordDelegate recordDelegate : writeStatus.getWrittenRecordDelegates()) {
  if (!writeStatus.isErrored(recordDelegate.getHoodieKey())) {
    if (recordDelegate.getIgnoreFlag()) {
Why are the delete records for old partitions ignored directly?
Only the deleted record in the partition-change scenario is ignored, not all deleted records.
Given an existing record (id1, par1) and a new record (id1, par2), if updatePartitionPath is set to true, a delete record is emitted to par1 and the new record (id1, par2) is added to par2. In this scenario the delete record does not need to update the index, because the new record will overwrite the index entry.
How do we handle deletes? That is, if we get a delete for a record in partition p1, then by the time it reaches the metadata writer we might have just one recordDelegate, but the ignore flag will not be set, since we do not set it in any of the write handles. So we should be good.
We set the ignore flag only in the indexing code, and specifically when indexing could return two versions of the record delegate.
Just wanted to confirm my understanding.
Yes, that's right: we set the ignore flag only in the indexing code, and specifically when indexing could return two versions of the record delegate.
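To make the scenario discussed in this thread concrete, here is a minimal, self-contained sketch. It is not Hudi code: the `Delegate` class and the `tagWithPartitionUpdateEnabled`, `userDelete`, and `forIndexUpdate` helpers are hypothetical stand-ins, and the sketch only assumes what the thread states, namely that a partition move emits a flagged delete plus a new record, while an ordinary delete carries no flag.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionUpdateTaggingSketch {

  /** Hypothetical stand-in for a record delegate; not Hudi's HoodieRecordDelegate. */
  public static final class Delegate {
    public final String recordKey;
    public final String partitionPath;
    public final boolean isDelete;
    public final boolean ignoreFlag;

    public Delegate(String recordKey, String partitionPath, boolean isDelete, boolean ignoreFlag) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
      this.isDelete = isDelete;
      this.ignoreFlag = ignoreFlag;
    }
  }

  /**
   * Models the indexing step with partition path update enabled (assumption).
   * existingPartition is null when the key is not in the index yet.
   */
  public static List<Delegate> tagWithPartitionUpdateEnabled(
      String key, String existingPartition, String incomingPartition) {
    List<Delegate> out = new ArrayList<>();
    if (existingPartition != null && !existingPartition.equals(incomingPartition)) {
      // Partition move: delete in the old partition, flagged so the index update skips it.
      out.add(new Delegate(key, existingPartition, true, true));
      // The new record in the new partition will overwrite the index entry.
      out.add(new Delegate(key, incomingPartition, false, false));
    } else {
      // New key or same partition: a single delegate, flag unset.
      out.add(new Delegate(key, incomingPartition, false, false));
    }
    return out;
  }

  /** An ordinary delete: the flag is never set outside the indexing step. */
  public static Delegate userDelete(String key, String partition) {
    return new Delegate(key, partition, true, false);
  }

  /** Keeps only the delegates that should reach the index update. */
  public static List<Delegate> forIndexUpdate(List<Delegate> written) {
    List<Delegate> kept = new ArrayList<>();
    for (Delegate d : written) {
      if (!d.ignoreFlag) {
        kept.add(d);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Delegate> moved = tagWithPartitionUpdateEnabled("id1", "par1", "par2");
    System.out.println("written=" + moved.size()
        + ", used for index update=" + forIndexUpdate(moved).size());
  }
}
```

In this sketch an ordinary delete created by `userDelete` always passes through `forIndexUpdate`, which matches the understanding confirmed above: only the delete produced by a partition move is skipped.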
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java
a37708f to b6faf77
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
36306f0 to fe66314
…s updated (apache#9630) --------- Co-authored-by: Balaji Varadarajan <balaji.varadarajan@robinhood.com>
Change Logs
Similar to #9114: this handles the corner case in which a record moves from one partition to another while partition path update is set to true.
Impact
hudi/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/hbase/SparkHoodieHBaseIndex.java
Lines 235 to 242 in eab00d5
This change therefore adds a flag so that these index delete operations are ignored in the partitionPathUpdate case.
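The effect on the index can be illustrated with a small simulation. This is a hedged sketch, not the HBase index implementation: it assumes an index keyed by record key only and assumes the delete for the old partition may be applied after the put for the new one. The `IndexOp` class and the `apply` helper are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexUpdateSketch {

  /** Hypothetical index operation derived from a written record. */
  public static final class IndexOp {
    public final String recordKey;
    public final String partitionPath;
    public final boolean isDelete;
    public final boolean ignoreFlag;

    public IndexOp(String recordKey, String partitionPath, boolean isDelete, boolean ignoreFlag) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
      this.isDelete = isDelete;
      this.ignoreFlag = ignoreFlag;
    }
  }

  /**
   * Applies operations in order to a copy of the index (record key -> partition).
   * When honorIgnoreFlag is true, flagged operations are skipped.
   */
  public static Map<String, String> apply(
      Map<String, String> initial, List<IndexOp> ops, boolean honorIgnoreFlag) {
    Map<String, String> index = new HashMap<>(initial);
    for (IndexOp op : ops) {
      if (honorIgnoreFlag && op.ignoreFlag) {
        continue; // delete caused by a partition move: the put already covers this key
      }
      if (op.isDelete) {
        index.remove(op.recordKey); // keyed by record key only, so this drops the whole entry
      } else {
        index.put(op.recordKey, op.partitionPath);
      }
    }
    return index;
  }

  public static void main(String[] args) {
    Map<String, String> initial = new HashMap<>();
    initial.put("id1", "par1");
    List<IndexOp> ops = Arrays.asList(
        new IndexOp("id1", "par2", false, false), // new record in par2
        new IndexOp("id1", "par1", true, true));  // flagged delete for par1
    System.out.println("flag honored: " + apply(initial, ops, true));
    System.out.println("flag not honored: " + apply(initial, ops, false));
  }
}
```

Under these assumptions, honoring the flag leaves id1 mapped to par2, while applying the delete removes the entry for id1 altogether.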
Risk level (write none, low medium or high below)
N/A
Documentation Update
N/A
Contributor's checklist