[HUDI-8519] Fix update with multiple secondary indexes #12259
codope wants to merge 2 commits into apache:master
  LOG.info(msg);
  final List<String> fileGroupFileIds = IntStream.range(0, fileGroupCount)
-     .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i))
+     .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i, partitionName))
So this impacts both the secondary index and the functional index, then?
Yes, it impacts both indexes.
- public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index) {
+ public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index, String partitionName) {
+   if (MetadataPartitionType.FUNCTIONAL_INDEX.equals(partitionType) || MetadataPartitionType.SECONDARY_INDEX.equals(partitionType)) {
+     return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
Why do we need the replace for the functional and secondary indexes? Can you help me understand?
The issue impacts both the functional and secondary indexes, so the fix is required for both.
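For readers following the thread, here is a minimal sketch of what the new formatting expression produces. It copies only the `String.format` expression from the diff into a standalone helper; the class name `FileIdFormatSketch`, the helper `fileIdFor`, and the partition name `secondary_index_idx_ts` are hypothetical and are not part of Hudi.

```java
final class FileIdFormatSketch {
    // Same expression as in the diff, lifted into a standalone helper for illustration.
    // This is not HoodieTableMetadataUtil; it only reproduces the string formatting.
    static String fileIdFor(String partitionName, int index) {
        return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
    }

    public static void main(String[] args) {
        // "secondary_index_idx_ts" is an assumed partition name used only as sample input.
        System.out.println(fileIdFor("secondary_index_idx_ts", 3));
        // prints "secondary-index-idx-ts-0003-0"
    }
}
```

The underscores in the partition name become hyphens, and the zero-padded file group index plus a trailing `-0` are appended, so the resulting file ID carries the index partition's own name.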
// update the secondary key column after creating multiple secondary indexes
spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
// validate the secondary index records themselves
checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
Can we also validate the entire contents of the secondary index? Let's also validate the partition path meta field in the MDT partition, which will ensure that the different secondary index partitions are validated for their content.
checkAnswer(s"select ts, record_key_col, not_record_key_col, partition_key_col from $tableName where record_key_col = 'row1'")(
  Seq(1, "row1", "xyz", "p1")
)
verifyQueryPredicate(hudiOpts, "not_record_key_col", "abc")
Can we also run a query predicate on the "ts" column (the other secondary index)?
Can we write tests for the functional index as well, since the fix also applies to the functional index?
Closing in favor of #12263
Change Logs
Multiple secondary indexes (or functional indexes) live in different metadata partitions but still use the same file ID prefix. So there is a chance of collision in the append handle when two different secondary indexes have the same file ID prefix and the same shard. This PR fixes the file ID prefix in that case and adds updates to the existing test case that creates multiple secondary indexes.
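The collision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Hudi implementation: the shared prefix `secondary-index-`, the partition names, and the class and method names are all hypothetical. Only the `String.format` pattern is taken from the diff in this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class FileIdCollisionSketch {
    // Assumed stand-in for a prefix shared by every index partition of the same type.
    static final String SHARED_PREFIX = "secondary-index-";

    // File ID built from the shared prefix: identical for every partition at a given index.
    static String sharedPrefixId(int index) {
        return String.format("%s%04d-%d", SHARED_PREFIX, index, 0);
    }

    // File ID built from the partition name, using the expression from the diff.
    static String perPartitionId(String partitionName, int index) {
        return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
    }

    // Counts distinct file IDs generated across all partitions.
    static long distinctIds(List<String> partitions, int fileGroupCount, boolean perPartition) {
        return partitions.stream()
            .flatMap(p -> IntStream.range(0, fileGroupCount)
                .mapToObj(i -> perPartition ? perPartitionId(p, i) : sharedPrefixId(i)))
            .distinct()
            .count();
    }

    public static void main(String[] args) {
        // Two hypothetical secondary index partitions with two file groups each.
        List<String> partitions = Arrays.asList("secondary_index_idx_a", "secondary_index_idx_b");
        System.out.println(distinctIds(partitions, 2, false)); // prints 2: IDs collide across partitions
        System.out.println(distinctIds(partitions, 2, true));  // prints 4: one unique ID per file group
    }
}
```

With a shared prefix, four file groups map to only two distinct IDs, so the same shard in two index partitions gets the same file ID. Deriving the prefix from the partition name gives each file group its own ID.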
Impact
Fix updates with multiple secondary indexes.
Risk level (write none, low medium or high below)
low
Only affects the secondary index and the functional index.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
Contributor's checklist